Attorney Docket No. 851663.479USPC
APPARATUS TO IMPLEMENT DUAL HASH ALGORITHM 

BACKGROUND OF THE INVENTION 

Field of the Invention 

The present invention relates to an efficient hardware implementation
of the Secure Hash Algorithm (SHA-1) and Message Digest Algorithm (MD5).

Description of the Related Art

Hash algorithms and message digests are frequently used in
applications such as digital signatures, where it is desirable to verify the
authenticity of a document or file. Techniques for producing message digests are
beneficial as they reduce the amount of data processing needed to a manageable
and consistent level.

The Secure Hash Algorithm (SHA-1) is specified in the Secure Hash
Standard (FIPS PUB 180-1), and is an algorithm that operates on an input
data file to produce a condensed representation of that file. Specifically, a
message of arbitrary length is processed to produce a message digest consisting
of exactly 160 bits.

The Message Digest Algorithm (MD5), developed by Professor 
Ronald Rivest, has a similar function. It accepts inputs of arbitrary length and 
produces an output message digest consisting of exactly 128 bits. 
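
As a brief software illustration of the two fixed digest lengths, the following sketch uses Python's standard hashlib module, which is not part of the described apparatus. The sample messages are arbitrary.

```python
import hashlib

# Two inputs of very different length.
short_message = b"abc"
long_message = b"a" * 10_000

# Digest length in bits for each algorithm and each input.
sha1_bits = [len(hashlib.sha1(m).digest()) * 8 for m in (short_message, long_message)]
md5_bits = [len(hashlib.md5(m).digest()) * 8 for m in (short_message, long_message)]

print(sha1_bits)  # digest length is independent of the input length
print(md5_bits)
```

The digest length depends only on the algorithm chosen, not on the size of the input.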

Both algorithms may be used as a constituent part of a digital
signature application. Both algorithms are computationally intensive and, when
implemented in software, as is the norm in prior art systems, can take a great
number of processor clock cycles to complete.
BRIEF SUMMARY OF THE INVENTION

The disclosed embodiments of the present invention therefore aim
to overcome problems with prior implementations of these systems,
particularly in relation to speed of operation and power consumption.


In one embodiment of the present invention, an apparatus arranged to
accept digital data as an input, and to process the data according to one of
either the Secure Hash Algorithm (SHA-1) or Message Digest (MD5) algorithm
to produce a fixed length output word, is provided. The apparatus includes a
plurality of rotational registers for storing data, one of the registers being
arranged to receive the input data; data stores for initialization of some of the
plurality of registers according to whether the SHA-1 or MD5 algorithm is used,
the data stores including fixed data relating to SHA-1 and MD5 operation; and a
plurality of dedicated combinatorial logic circuits arranged to perform logic
operations on data stored in selected ones of the plurality of registers.

Preferably, the register that receives the input data is arranged to
receive the input data serially.

Preferably, the registers and combinatorial logic circuits are
interconnected for communication via a pair of data busses. It is particularly
preferable if the registers and combinatorial logic circuits are connected to write to
a respective bus via respective tristate buffers.

Preferably, the apparatus includes a control circuit arranged to
generate individually gated clock signals for each register. This results in lower
power consumption as only active registers need to be clocked.

Preferably, the control circuit is further arranged to generate
individual enabling signals to control the tristate buffers. The control circuit may be
implemented as a dedicated state machine, or by another means such as a
microcontroller.

Preferably, the rotational registers are arranged to be multiplexed
prior to connection to a tristate buffer. This results in a lower bus loading.

Preferably, the combinatorial logic circuits include a copy circuit, a
shift left circuit, a NOT circuit, an ADD circuit, an OR circuit, an AND circuit and an
XOR circuit. Each circuit is dedicated to its particular task, avoiding redundancy.

Preferably, the apparatus is implemented as an integrated circuit,
typically of the ASIC type. The apparatus may be incorporated with other
apparatus, typically digital signature apparatus.

Embodiments of the present invention utilize the fact that both
algorithms may be broken down into a series of individual steps. Prior
approaches to implementing the algorithms in software do not utilize any
specialized hardware components, which results in a relatively slow
process. However, embodiments of the invention identify, where possible, the
common elements between the MD5 and SHA-1 algorithms and provide
specialized hardware components to achieve the required functionality.
Hardware is selected to allow for the maximum sharing of
components and hence the minimum overall component count.

Embodiments of the present invention allow a relatively small number
of dedicated components to be used in a circuit to efficiently calculate either MD5
or SHA-1 message digests. Since the operations involved in both algorithms are
similar, the circuit can be optimized such that components
that are common to both algorithms are provided only once. Allowing either
of the algorithms to be used in calculating a message digest is advantageous as
there are several digital signature systems operational that make use of one
or other of the SHA-1 or MD5 algorithms. Systems that utilize an
embodiment of the invention will benefit from increased flexibility and speed.
In accordance with another embodiment of the invention, a circuit is
provided that includes a plurality of data storage registers for storing data to be
processed, a plurality of shift registers for temporary data storage, a plurality of
logic circuits for performing operations on data, and a control circuit configured to
control the data storage registers, the circular shift registers, and the logic circuits
to selectively perform MD5 and SHA-1 operations on data.

In accordance with another aspect of the foregoing embodiment, a
plurality of initialization storage registers are provided to store and output
initialization data.

In accordance with a further embodiment of the invention, a dual
hash algorithm circuit is provided that includes a first bank and a second bank of
data storage registers, a first bank and a second bank of circular shift registers,
including at least one register to receive a data stream as input to the circuit, a
bank of initialization data registers, a bank of temporary data registers, a plurality
of combinatorial logic circuits, a read bus and a write bus, a control system for
selectively coupling and uncoupling the first bank and second bank of data storage
registers, the first bank and second bank of circular shift registers, the bank of
initialization data registers, the bank of temporary data registers, and the plurality
of combinatorial logic circuits to the read bus and the write bus to selectively
perform MD5 and SHA-1 operations on the data to output data of a fixed length in
accordance with the selected MD5 and SHA-1 operations.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

For a better understanding of the present invention and to
understand how the same may be brought into effect, the invention will now be
described by way of example only, with reference to the appended drawings in
which:

Figure 1 shows a view of the architecture of the combined SHA-1
and MD5 processor; and

Figure 2 shows a view of the control circuit used to control the
architecture of Figure 1.

DETAILED DESCRIPTION OF THE INVENTION

Figure 1 shows a customized architecture which is
arranged to receive a data input 150, process it using the shown elements, and
produce a data output 155. The hardware shown is able to perform either SHA-1
or MD5 processing on the input data, and has been optimized in order to
minimize the amount of hardware needed to perform either one of the
algorithms.

The circuit includes a plurality of registers for storing data. There are
ten registers provided in two banks 110, 115 for storing part of the data being
processed. In addition, two temporary registers 120, 135 are provided for
intermediate processing and temporary storage. Also provided are two banks 125,
130 of circular shift registers W15[31:0] - W0[31:0]. Register W15 of bank 125 is
arranged to receive the input data 150. Any data held in W15 at that time is shifted
to W14; the data in W14 is shifted to W13 and so on, until the data held in W0 is
lost. The outputs of banks 125 and 130 are multiplexed before being attached to
the read bus 140 by a tristate buffer in order to reduce bus loading.
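
The loading behaviour of the W15 to W0 chain can be modelled in software. The following is a minimal sketch only, assuming the input arrives one 32-bit word at a time; the class name WordShiftBank and its methods are hypothetical and do not appear in the figures.

```python
class WordShiftBank:
    """Software model of the sixteen 32-bit registers W0..W15 (illustrative only)."""

    def __init__(self):
        # words[i] holds the contents of register Wi.
        self.words = [0] * 16

    def shift_in(self, word):
        # New data enters W15; W15 -> W14, W14 -> W13, ..., and W0 is lost.
        self.words = self.words[1:] + [word & 0xFFFFFFFF]

    def rotate(self):
        # Circular variant: as above, but W15 receives the old W0 instead of new input.
        self.words = self.words[1:] + self.words[:1]


bank = WordShiftBank()
for word in range(1, 17):      # load sixteen words, i.e. one 512-bit block
    bank.shift_in(word)
```

After sixteen loads the first word received sits in W0 and the last in W15, which matches the order in which the pseudo-code later reads W[0] to W[15].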

The registers are mutually interconnected for communication via a
read bus 140 and a write bus 145.

The read bus 140 is connected to a range of logic circuits which
provide combinatorial functions. These functions are: Copy (CP) 200, Shift Left
multiple positions (SL*) 205, NOT 210, ADD 215, OR 220, AND 225, XOR 230 and
Shift Left one position (SL1) 235. Functions 200, 205, 210 require only a single
input variable and receive it directly from the read bus 140. The other functions
215, 220, 225 and 230 require two input variables and receive one from the read
bus 140 and the other from the temporary register (ACCU[31:0]) 135. Register
135 also provides the input for shift register 235.
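
The way each function takes its operands can be summarised in a short software model. This is a sketch under stated assumptions: the function name logic_output and the string operation codes are hypothetical, and the shifts are modelled as circular left rotations as stated in the pseudo-code comments later in this description.

```python
MASK32 = 0xFFFFFFFF


def rotl(value, count):
    """Circular left rotation of a 32-bit word."""
    return ((value << count) | (value >> (32 - count))) & MASK32


def logic_output(op, read_bus, accu, shift=0):
    """Value driven onto the write bus by one logic function (illustrative model)."""
    if op == "CP":    # 200: single input from the read bus
        return read_bus
    if op == "SL":    # 205: single input from the read bus, multiple positions
        return rotl(read_bus, shift)
    if op == "NOT":   # 210: single input from the read bus
        return ~read_bus & MASK32
    if op == "ADD":   # 215: read bus and ACCU, modulo 2**32
        return (accu + read_bus) & MASK32
    if op == "OR":    # 220
        return accu | read_bus
    if op == "AND":   # 225
        return accu & read_bus
    if op == "XOR":   # 230
        return accu ^ read_bus
    if op == "SL1":   # 235: input taken from ACCU, not from the read bus
        return rotl(accu, 1)
    raise ValueError(f"unknown operation {op!r}")
```

Only one of these outputs would be enabled onto the write bus in any given cycle.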
Also connected to the read bus via a multiplexer and a tristate logic
gate is a bank 160 of registers including fixed constants used in the
initialization of the circuit for either SHA-1 or MD5 mode calculations.
K[t] is provided for initialization of SHA-1, and T[i] is provided for
initialization of MD5. In total, approximately seventy-five constants
each having a length of 32 bits are required, and grouping them together in this
fashion allows them to be conveniently accessed. The synthesis tool which places
the gates in the finished custom device is then able to optimize the logic,
resulting in a smaller gate count, and thus a smaller area of silicon is required.

Calculation of either SHA-1 or MD5 requires the use of selected ones
of the provided registers and combinatorial functions. In particular, calculation of
the SHA-1 algorithm uses all the registers of bank 110 and of bank 115.
Calculation of MD5 requires only the use of four registers (H0 - H3) of bank 110
and four registers (A-D) of bank 115. This allows the unused registers to be used
for temporary storage if required. However, when the result of the calculation 155
is unloaded from register H0 of bank 110, all five registers are read since they are
implemented as shift registers, and this ensures that their contents are unchanged.

All devices that can output data to the read bus 140 are
connected to the bus via a tristate buffer. Each buffer is individually enabled via a
control signal created by the control circuit shown in Figure 2. Likewise, the
combinatorial functions 200-235 which can write data onto the write bus 145 are
connected to the write bus 145 via individually controllable tristate buffers.

The group of clock signals 345 to individual registers is created
from a master clock signal 340. The master clock signal is ANDed with a control
signal to create a gated clock signal for the appropriate register. In this way, the
energy consumption of the complete circuit is reduced because only active
registers need to be clocked.
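
The effect of gating can be illustrated with a small software model. This is a sketch only: the function gated_clock and the five-step schedule are illustrative assumptions, chosen to mirror a sequence in which exactly one register is written per cycle.

```python
def gated_clock(master_clock, enable):
    """Gated clock for one register: master clock ANDed with its control signal."""
    return master_clock & enable


registers = ["A", "B", "C", "D", "E"]
# Register written in each of five consecutive cycles (one write per cycle).
schedule = ["A", "B", "C", "D", "E"]

pulses = {name: 0 for name in registers}
for target in schedule:
    for name in registers:
        # The master clock pulses (1) every cycle; only the target is enabled.
        pulses[name] += gated_clock(1, int(name == target))

gated_total = sum(pulses.values())
ungated_total = len(schedule) * len(registers)   # every register clocked every cycle
```

In this model the gated scheme delivers 5 clock pulses in total, against 25 if every register were clocked on every cycle.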

Figure 2 shows a top level view of the control circuit 400 which
generates the various control signals for the circuit of Figure 1. In particular, it
generates, from a master clock signal 340, a series 345 of gated individual clock
signals that are used to clock the various registers of Figure 1. It also
generates individual enable signals for each of the tristate buffers shown in
Figure 1. The control circuit may take the form of a finite state machine
including associated controlling circuits.

The following pseudo-code represents the steps performed in
calculating a message digest according to the SHA-1 algorithm on an input data
word of arbitrary length.

The high level algorithm details in broad terms the steps taken in
performing a calculation according to the SHA-1 algorithm. The following more
detailed code provides step by step instructions on performing the individual
instructions needed to calculate the message digest.

SHA-1 Algorithm

// High Level Algorithm

initialize SHA-1 internal registers (H0,H1,H2,H3,H4)
for each Mi, block of 512 bits of M do
    load Mi into data registers W[0] to W[15]
    start core SHA-1
end

unload H0,H1,H2,H3,H4

// Detailed steps

SHA-1 initialization
H0 = 67452301
H1 = EFCDAB89
H2 = 98BADCFE
H3 = 10325476
H4 = C3D2E1F0

Core SHA-1

A=H0, B=H1, C=H2, D=H3, E=H4
MASK=0000000F
for t=0 to 79 do
    s = t and MASK;
    if (t>=16) W[s] = SL1(W[(s + 13) and MASK] xor
                          W[(s + 8) and MASK] xor
                          W[(s + 2) and MASK] xor W[s]);
    end if
    TEMP = SL5(A) + ft(B,C,D) + E + W[s] + K[t]
    E=D, D=C, C=SL30(B), B=A, A=TEMP
end for

H0=H0+A, H1=H1+B, H2=H2+C, H3=H3+D, H4=H4+E

// The functions SL1, SL5 and SL30 are circular left rotations
// of the 32 bit operand by 1 bit, 5 bits and 30 bits
// respectively.

// The constants Kt are defined by the following:

Kt = 5A82 7999 (0 <= t <= 19)
Kt = 6ED9 EBA1 (20 <= t <= 39)
Kt = 8F1B BCDC (40 <= t <= 59)
Kt = CA62 C1D6 (60 <= t <= 79).

// The function ft(B,C,D) is defined by the following:

ft(B,C,D) = (B and C) or ((not B) and D) (0 <= t <= 19)
ft(B,C,D) = B xor C xor D (20 <= t <= 39)
ft(B,C,D) = (B and C) or (B and D) or (C and D) (40 <= t <= 59)
ft(B,C,D) = B xor C xor D (60 <= t <= 79).
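
The SHA-1 pseudo-code above can be exercised as a runnable software sketch. The version below is illustrative only: it keeps the sixteen-word circular W buffer and the MASK indexing of the pseudo-code, but the message padding is not described in this document and is taken from FIPS PUB 180-1, and the function names are assumptions. The result is compared with Python's standard hashlib module.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)   # Kt for each group of 20 steps


def rotl(value, count):
    """Circular left rotation of a 32-bit word (SL1, SL5, SL30)."""
    return ((value << count) | (value >> (32 - count))) & MASK32


def ft(t, b, c, d):
    """Round-dependent logical function ft(B,C,D)."""
    if t < 20:
        return (b & c) | (~b & d)
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d


def sha1(message):
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Padding per FIPS PUB 180-1: a 1 bit, zeros, then the 64-bit message length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    for offset in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        a, b, c, d, e = h
        for t in range(80):
            s = t & 0xF
            if t >= 16:
                # Message schedule computed in place in the 16-word buffer.
                w[s] = rotl(w[(s + 13) & 0xF] ^ w[(s + 8) & 0xF]
                            ^ w[(s + 2) & 0xF] ^ w[s], 1)
            temp = (rotl(a, 5) + ft(t, b, c, d) + e + w[s] + K[t // 20]) & MASK32
            e, d, c, b, a = d, c, rotl(b, 30), a, temp
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)


print(sha1(b"abc").hex() == hashlib.sha1(b"abc").hexdigest())
```

Computing the schedule in place is what allows sixteen W registers to suffice for all 80 steps.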



The following pseudo-code represents the steps performed in 
calculating a message digest according to the MD5 algorithm on an input data 
word of arbitrary length. 

MD5 Pseudo Algorithm

// Here, the four auxiliary functions that each take as input
// three 32-bit words and produce as output one 32-bit word
// are defined:

F(X,Y,Z) = (X and Y) or (not(X) and Z)
G(X,Y,Z) = (X and Z) or (Y and not(Z))
H(X,Y,Z) = X xor Y xor Z
I(X,Y,Z) = Y xor (X or not(Z))

// A 64-element table T[1 ... 64] constructed from the sine
// function is defined. Let T[i] denote the i-th element of
// the table, which is equal to the integer part of 4294967296
// times abs(sin(i)), where i is in radians.

High Level Algorithm

initialize MD5 internal registers (H0,H1,H2,H3)
for each Mi, block of 512 bits of M do
    load Mi into data registers W[0] to W[15]
    start core MD5
end

unload H0, H1, H2, H3

MD5 initialization
H0 = 67 45 23 01
H1 = ef cd ab 89
H2 = 98 ba dc fe
H3 = 10 32 54 76

Core MD5

A=H0, B=H1, C=H2, D=H3

// Round 1.
// Let [abcd k s i] denote the operation
// a = b + ((a + F(b,c,d) + W[k] + T[i]) <<< s).
// Do the following 16 operations.

[ABCD 0 7 1] [DABC 1 12 2] [CDAB 2 17 3] [BCDA 3 22 4]
[ABCD 4 7 5] [DABC 5 12 6] [CDAB 6 17 7] [BCDA 7 22 8]
[ABCD 8 7 9] [DABC 9 12 10] [CDAB 10 17 11] [BCDA 11 22 12]
[ABCD 12 7 13] [DABC 13 12 14] [CDAB 14 17 15] [BCDA 15 22 16]

// Round 2.
// Let [abcd k s i] denote the operation
// a = b + ((a + G(b,c,d) + W[k] + T[i]) <<< s).
// Do the following 16 operations.

[ABCD 1 5 17] [DABC 6 9 18] [CDAB 11 14 19] [BCDA 0 20 20]
[ABCD 5 5 21] [DABC 10 9 22] [CDAB 15 14 23] [BCDA 4 20 24]
[ABCD 9 5 25] [DABC 14 9 26] [CDAB 3 14 27] [BCDA 8 20 28]
[ABCD 13 5 29] [DABC 2 9 30] [CDAB 7 14 31] [BCDA 12 20 32]

// Round 3.
// Let [abcd k s i] denote the operation
// a = b + ((a + H(b,c,d) + W[k] + T[i]) <<< s).
// Do the following 16 operations.

[ABCD 5 4 33] [DABC 8 11 34] [CDAB 11 16 35] [BCDA 14 23 36]
[ABCD 1 4 37] [DABC 4 11 38] [CDAB 7 16 39] [BCDA 10 23 40]
[ABCD 13 4 41] [DABC 0 11 42] [CDAB 3 16 43] [BCDA 6 23 44]
[ABCD 9 4 45] [DABC 12 11 46] [CDAB 15 16 47] [BCDA 2 23 48]

// Round 4.
// Let [abcd k s i] denote the operation
// a = b + ((a + I(b,c,d) + W[k] + T[i]) <<< s).
// Do the following 16 operations.

[ABCD 0 6 49] [DABC 7 10 50] [CDAB 14 15 51] [BCDA 5 21 52]
[ABCD 12 6 53] [DABC 3 10 54] [CDAB 10 15 55] [BCDA 1 21 56]
[ABCD 8 6 57] [DABC 15 10 58] [CDAB 6 15 59] [BCDA 13 21 60]
[ABCD 4 6 61] [DABC 11 10 62] [CDAB 2 15 63] [BCDA 9 21 64]

H0=H0+A, H1=H1+B, H2=H2+C, H3=H3+D
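
The MD5 pseudo-code above can likewise be written as a runnable software sketch. The version below is illustrative only: the padding and the little-endian word order are not described in this document and are taken from the MD5 specification (RFC 1321), the closed-form expressions for the word index k are an assumption that reproduces the k values listed in the four rounds, and all names are hypothetical. The result is compared with Python's standard hashlib module.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF
# T[1..64]: integer part of 4294967296 * abs(sin(i)); T[0] is an unused placeholder.
T = [0] + [int(4294967296 * abs(math.sin(i))) & MASK32 for i in range(1, 65)]

AUX = (
    lambda x, y, z: (x & y) | (~x & z),            # F, round 1
    lambda x, y, z: (x & z) | (y & ~z),            # G, round 2
    lambda x, y, z: x ^ y ^ z,                     # H, round 3
    lambda x, y, z: y ^ (x | (~z & MASK32)),       # I, round 4
)
SHIFTS = ((7, 12, 17, 22), (5, 9, 14, 20), (4, 11, 16, 23), (6, 10, 15, 21))
# Message word index k for step j (0..15) of each round.
K_INDEX = (
    lambda j: j,
    lambda j: (1 + 5 * j) % 16,
    lambda j: (5 + 3 * j) % 16,
    lambda j: (7 * j) % 16,
)


def rotl(value, count):
    """Circular left rotation of a 32-bit word (the <<< operator)."""
    return ((value << count) | (value >> (32 - count))) & MASK32


def md5(message):
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    # Padding per RFC 1321: a 1 bit, zeros, then the 64-bit length, low byte first.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", (len(message) * 8) & 0xFFFFFFFFFFFFFFFF)
    for offset in range(0, len(padded), 64):
        w = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = h
        for i in range(1, 65):
            rnd, j = divmod(i - 1, 16)
            k = K_INDEX[rnd](j)
            s = SHIFTS[rnd][j % 4]
            total = (a + AUX[rnd](b, c, d) + w[k] + T[i]) & MASK32
            new_b = (b + rotl(total, s)) & MASK32
            # Renaming the registers each step replaces the ABCD/DABC/CDAB/BCDA cycling.
            a, b, c, d = d, new_b, b, c
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d))]
    return struct.pack("<4I", *h)


print(md5(b"abc").hex() == hashlib.md5(b"abc").hexdigest())
```

The register renaming at the end of each step corresponds to the sequence A := D, D := C, C := B, B := TMP used in the atomic operation tables.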

The information below sets out the so-called atomic operations which
are required to perform the different algorithm calculations. The following steps
indicate the operation number, the operation performed, and the status of the read
140 and write 145 busses. Each operation listed below takes exactly one clock
cycle.

SHA-1 ALGORITHM
initialization

##    operation                Readbus   Writebus
01.   A := H0                  H0        (copy)
02.   B := H1                  H1        (copy)
03.   C := H2                  H2        (copy)
04.   D := H3                  H3        (copy)
05.   E := H4                  H4        (copy)

0<=t<=15

##    operation                Readbus   Writebus
01.   ACCU := B                B         (copy)
02.   TMP := ACCU and C        C         (and)
03.   ACCU := NOT B            B         (not)
04.   ACCU := ACCU and D       D         (and)
05.   ACCU := ACCU or TMP      TMP       (or)
06.   ACCU := ACCU + W[0]      W[0]      (+)
07.   ACCU := ACCU + E         E         (+)
08.   TMP := SL5(A)            A         (SL5)
09.   ACCU := ACCU + TMP       TMP       (+)
10.   TMP := ACCU + K[t]       K[t]      (+)
11.   E := D                   D         (copy)
12.   D := C                   C         (copy)
13.   C := SL30(B)             B         (SL30)
14.   B := A                   A         (copy)
15.   A := TMP                 TMP       (copy)
16.   ROTATE W[i]

16<=t<=19

##    operation                Readbus   Writebus
01.   ACCU := B                B         (copy)
02.   TMP := ACCU and C        C         (and)
03.   ACCU := NOT B            B         (not)
04.   ACCU := ACCU and D       D         (and)
05.   TMP := ACCU or TMP       TMP       (or)
06.   ACCU := W[13]            W[13]     (copy)
07.   ACCU := ACCU xor W[8]    W[8]      (xor)
08.   ACCU := ACCU xor W[2]    W[2]      (xor)
09.   ACCU := ACCU xor W[0]    W[0]      (xor)
10.   W[0] := SL1(ACCU)        (ACCU)    (SL1)
11.   ACCU := W[0]             W[0]      (copy)
12.   ACCU := ACCU + TMP       TMP       (+)
13.   ACCU := ACCU + E         E         (+)
14.   TMP := SL5(A)            A         (SL5)
15.   ACCU := ACCU + TMP       TMP       (+)
16.   TMP := ACCU + K[t]       K[t]      (+)
17.   E := D                   D         (copy)
18.   D := C                   C         (copy)
19.   C := SL30(B)             B         (SL30)
20.   B := A                   A         (copy)
21.   A := TMP                 TMP       (copy)
22.   ROTATE W[i]
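
The 0<=t<=15 sequence can be replayed in software to check that fifteen single-bus transfers reproduce one step of the SHA-1 pseudo-code. This is a sketch under stated assumptions: the tuple encoding of each operation, the register dictionary and the name run_step are hypothetical, and operation 16 (ROTATE) is omitted because it does not change A to E.

```python
MASK32 = 0xFFFFFFFF


def rotl(value, count):
    """Circular left rotation of a 32-bit word."""
    return ((value << count) | (value >> (32 - count))) & MASK32


# Each function sees the ACCU register and the value on the read bus.
OPS = {
    "copy": lambda accu, bus: bus,
    "not": lambda accu, bus: ~bus & MASK32,
    "and": lambda accu, bus: accu & bus,
    "or": lambda accu, bus: accu | bus,
    "+": lambda accu, bus: (accu + bus) & MASK32,
    "SL5": lambda accu, bus: rotl(bus, 5),
    "SL30": lambda accu, bus: rotl(bus, 30),
}

# (destination register, logic function, register driving the read bus)
STEPS = [
    ("ACCU", "copy", "B"), ("TMP", "and", "C"), ("ACCU", "not", "B"),
    ("ACCU", "and", "D"), ("ACCU", "or", "TMP"), ("ACCU", "+", "W0"),
    ("ACCU", "+", "E"), ("TMP", "SL5", "A"), ("ACCU", "+", "TMP"),
    ("TMP", "+", "K"), ("E", "copy", "D"), ("D", "copy", "C"),
    ("C", "SL30", "B"), ("B", "copy", "A"), ("A", "copy", "TMP"),
]


def run_step(registers):
    """Apply operations 01 to 15 in order, one bus transfer per clock cycle."""
    regs = dict(registers)
    for dest, op, source in STEPS:
        regs[dest] = OPS[op](regs["ACCU"], regs[source])
    return regs


start = {"A": 0x67452301, "B": 0xEFCDAB89, "C": 0x98BADCFE, "D": 0x10325476,
         "E": 0xC3D2E1F0, "W0": 0x61626380, "K": 0x5A827999, "ACCU": 0, "TMP": 0}
after = run_step(start)
```

Because every operation reads at most one register onto the read bus and writes one register from the write bus, the sequence needs only the ACCU and TMP registers as working storage.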







20<=t<=39 and 60<=t<=79 
final round

##    operation                Readbus   Writebus
01.   ACCU := A                A         (copy)
02.   H0 := ACCU + H0          H0        (+)
03.   ACCU := B                B         (copy)
04.   H1 := ACCU + H1          H1        (+)
05.   ACCU := C                C         (copy)
06.   H2 := ACCU + H2          H2        (+)
07.   ACCU := D                D         (copy)
08.   H3 := ACCU + H3          H3        (+)
09.   ACCU := E                E         (copy)
10.   H4 := ACCU + H4          H4        (+)



MD5 ALGORITHM
initialization

##    operation                Readbus   Writebus
01.   A := H0                  H0        (copy)
02.   B := H1                  H1        (copy)
03.   C := H2                  H2        (copy)
04.   D := H3                  H3        (copy)

Round 1 (16 iterations): 0<=i<=15; k=0; s=7,12,17,22,7,12,17,22...

##    operation                Readbus   Writebus
01.   ACCU := B                B         (copy)
02.   TMP := ACCU and C        C         (and)
03.   ACCU := NOT B            B         (not)
04.   ACCU := ACCU and D       D         (and)
05.   TMP := ACCU or TMP       TMP       (or)
06.   ACCU := W[k]             W[k]      (copy)
07.   ACCU := ACCU + A         A         (+)
08.   ACCU := ACCU + T[i]      T[i]      (+)
09.   TMP := ACCU + TMP        TMP       (+)
10.   ACCU := SL[s](TMP)       TMP       (SL[s])
11.   TMP := ACCU + B          B         (+)
12.   A := D                   D         (copy)
13.   D := C                   C         (copy)
14.   C := B                   B         (copy)
15.   B := TMP                 TMP       (copy)
16.   ROTATE W[k]







Preparation for Round 2 
01. ROTATE W[k] 

Round 2 (16 iterations): 16<=i<=31; k=1; s=5,9,14,20,5,9,14,20,5

##    operation                Readbus   Writebus
01.   ACCU := B                B         (copy)
02.   TMP := ACCU and D        D         (and)
03.   ACCU := NOT D            D         (not)
04.   ACCU := ACCU and C       C         (and)
05.   TMP := ACCU or TMP       TMP       (or)
06.   ACCU := W[k]             W[k]      (copy)
07.   ACCU := ACCU + A         A         (+)
08.   ACCU := ACCU + T[i]      T[i]      (+)
09.   TMP := ACCU + TMP        TMP       (+)
10.   ACCU := SL[s](TMP)       TMP       (SL[s])
11.   TMP := ACCU + B          B         (+)
12.   A := D                   D         (copy)
13.   D := C                   C         (copy)
14.   C := B                   B         (copy)
15.   B := TMP                 TMP       (copy)
16.   ROTATE W[k]
17.   ROTATE W[k]
18.   ROTATE W[k]
19.   ROTATE W[k]
20.   ROTATE W[k]







Preparation for Round 3 

01. ROTATE W[k] 

02. ROTATE W[k] 

03. ROTATE W[k] 

04. ROTATE W[k] 

Round 3 (16 iterations): 32<=i<=47; k=5; s=4,11,16,23,4,11,16

As an example of how the information above should be interpreted,
step number 2 of the SHA-1 initialization section relates to the
operation B := H1, meaning that the register B is set to the value stored in H1. To
achieve this, the tristate buffer 321 of register H1 and the tristate buffer 301 of the
copy logic are enabled together. At the same time, the clock to register B is
enabled, resulting in the data in H1 being written into B. The tristate buffer control
and clock signals are generated by the control circuit 400.

Similarly, step number 10 in the SHA-1 0<= t <= 15 stage relates to
the operation TMP := ACCU + K[t]. The multiplexer and tristate buffer 332 is
enabled for K[10]. The tristate buffer 304 is enabled for the ADD logic 215 and a
gated clock signal is created and applied to the TMP register 120. In this way, the
rising clock signal causes the sum of the data in K[10] and ACCU to be written into
the TMP register.

The last instruction in the 0<= t <= 15 stage for SHA-1 (and the 0<= i
<= 15 stage for MD5) causes the entire Wi chain to be rotated, so that W14 is
loaded with the data previously in W15, W13 receives the data previously in W14,
and W15 receives the data previously in W0. Advantageously, this instruction may
be implemented in parallel with the instruction above it (Step 15) as the rotate
instruction does not involve placing data onto the data bus. In this way, one clock
cycle per iteration is saved, leading to a total saving of 80 cycles for SHA-1 and 64
cycles for MD5.
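
The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative, and the per-stage count assumes the sixteen listed operations of the 0<= t <= 15 stage, each taking one clock cycle.

```python
def stage_cycles(iterations, listed_operations, overlap_rotate):
    """Clock cycles for one stage, with or without the rotate overlapped."""
    per_iteration = listed_operations - (1 if overlap_rotate else 0)
    return iterations * per_iteration


def total_saving(iterations, cycles_saved_per_iteration=1):
    """Cycles saved over a whole block when each iteration saves one cycle."""
    return iterations * cycles_saved_per_iteration


# The 0<=t<=15 stage: 16 iterations of 16 listed operations.
separate = stage_cycles(16, 16, overlap_rotate=False)
overlapped = stage_cycles(16, 16, overlap_rotate=True)

sha1_saving = total_saving(80)   # 80 iterations per 512-bit block
md5_saving = total_saving(64)    # 64 iterations per 512-bit block
```

For the 0<= t <= 15 stage alone this gives 240 cycles instead of 256.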

The embodiment presented has a bus width of 32 bits. However, it is
possible to reduce the bus width to reduce the silicon area of the design at the
expense of operational speed. If the bus width is reduced to 16 bits, each 32 bit
XOR operation, for example, will take two cycles rather than the one cycle
needed with a 32 bit bus.
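
The trade-off can be illustrated with a software model of a 32 bit XOR carried out over a 16 bit bus. This is a sketch only; the function name and the low-half-first ordering are assumptions.

```python
def xor32_on_16bit_bus(a, b):
    """XOR two 32-bit words using 16-bit transfers; returns (result, cycles)."""
    result = 0
    cycles = 0
    for half in range(2):                 # low half first, then high half
        shift = 16 * half
        a_half = (a >> shift) & 0xFFFF    # 16 bits placed on the bus
        b_half = (b >> shift) & 0xFFFF
        result |= (a_half ^ b_half) << shift
        cycles += 1                       # one bus transfer per clock cycle
    return result, cycles


value, cycles_used = xor32_on_16bit_bus(0x67452301, 0xEFCDAB89)
```

XOR has no dependency between the two halves; an ADD split in the same way would additionally need the carry from the low half to be passed to the high half.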

The present invention includes any novel feature or combination of
features disclosed herein either explicitly or any generalization
thereof irrespective of whether or not it relates to the claimed invention or mitigates
any or all of the problems addressed.

From the foregoing it will be appreciated that, although specific
embodiments of the invention have been described herein for purposes of
illustration, various modifications may be made without deviating from the spirit
and scope of the invention. Accordingly, the invention is not limited except as by
the appended claims and the equivalents thereof.